genotype is called and assigned to the sample, and positions with putative variants
(substitutions or InDels) are written into the VCF file.
In the following, we will perform variant calling with both FreeBayes and GATK, which
are examples of haplotype-based variant callers.
4.2.2.1 FreeBayes Variant Calling Pipeline
FreeBayes [7] is a haplotype-based Bayesian variant detector used to detect small
variants such as single- and multiple-nucleotide polymorphisms (SNPs and MNPs) and
InDels. On Debian-based Linux distributions such as Ubuntu, we can install FreeBayes
as follows:
sudo apt update
sudo apt install freebayes
Use the following command to read more about FreeBayes usage and options:
freebayes --help
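Before running the pipeline, it can help to confirm that the required tools are reachable from the shell. The following is a minimal sketch; the helper name have_tool is hypothetical, and the tool list simply reflects the two programs used in this section.

```shell
#!/bin/bash
# Sketch: check that the tools used in this section are on PATH.
# have_tool is a hypothetical helper, not part of FreeBayes or the SRA Toolkit.
have_tool() {
    command -v "$1" > /dev/null 2>&1
}

for tool in freebayes fasterq-dump; do
    if have_tool "$tool"; then
        echo "$tool found at: $(command -v "$tool")"
    else
        echo "$tool not found on PATH; install it before continuing" >&2
    fi
done
```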
For variant calling, we will use the following form:
freebayes \
-f ../ref/GCF_009858895.2_ASM985889v3_genomic.fna \
-C 5 \
-L bam_list.txt \
-v ../variants/sarscov2.vcf
The “-f” option specifies the reference FASTA file; “-C” sets the minimum number of
observations supporting an alternate allele within a single individual for the position
to be evaluated; “-L” passes the name of a text file that lists the BAM files, one file
name per line; and “-v” passes the name of the output VCF file.
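The text file passed to “-L” can be generated rather than typed by hand. The sketch below is one possible way to do this, assuming the BAM files carry the “.bam” extension and sit in a single directory; the function name make_bam_list and its default arguments are illustrative and not part of FreeBayes.

```shell
#!/bin/bash
# Sketch: write one BAM file path per line, the layout that "-L" expects.
# ASSUMPTION: all BAM files are in one directory and end in ".bam".
# make_bam_list is a hypothetical helper name.
make_bam_list() {
    local bam_dir="${1:-.}"            # directory holding the BAM files
    local out="${2:-bam_list.txt}"     # list file to create

    : > "$out"                         # start with an empty list
    for b in "$bam_dir"/*.bam; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        if [ -e "$b" ]; then
            printf '%s\n' "$b" >> "$out"
        fi
    done
    echo "BAM files listed in $out: $(wc -l < "$out")"
}

# Example call: list the BAM files in the current directory.
make_bam_list . bam_list.txt
```

Calling the function from the directory that holds the BAM files writes paths relative to that directory, so FreeBayes should then be launched from the same place.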
We will now use the FreeBayes command shown above to identify variants in the
SARS-CoV-2 samples, following the same steps we used for “bcftools” above. First, we
create a project directory and store the run IDs in a file “ids.txt” as before. Then, we
save the following script in a file “pipeline_freebayes.sh” and execute it with
“bash pipeline_freebayes.sh”:
#!/bin/bash
# SARS-CoV-2 variant calling
# --------------------------
# 1- Download FASTQ files from the NCBI SRA database
mkdir -p fastq
while read -r f
do
    fasterq-dump --progress --outdir fastq "$f"
done < ids.txt